Joint Sparse Representation for Robust Multimodal Biometrics Recognition

Sumit Shekhar, Student Member, IEEE, Vishal M. Patel, Member, IEEE, Nasser M. Nasrabadi, Fellow, IEEE,
and Rama Chellappa, Fellow, IEEE


Abstract: Traditional biometric recognition systems rely on a
single biometric signature for authentication. While the advantage
of using multiple sources of information for establishing
identity has been widely recognized, computational models for
multimodal biometrics recognition have only recently received
attention. We propose a multimodal sparse representation method,
which represents the test data by a sparse linear combination of
training data, while constraining the observations from different
modalities of the test subject to share their sparse representations.
Thus, we simultaneously take into account correlations as well as
coupling information among biometric modalities. We modify our
model so that it is robust to noise and occlusion. A multimodal
quality measure is also proposed to weigh each modality as it gets
fused. Furthermore, we kernelize the algorithm to handle
non-linearity in data. The optimization problem is solved using an
efficient alternating direction method. Various experiments show
that our method compares favorably with competing fusion-based
methods.

Index  Terms — Multimodal  biometrics,  feature  fusion,  sparse 
representation. 

I.  Introduction 

Unimodal biometric systems rely on a single source of
information, such as a single iris, fingerprint or face, for
authentication [1]. Unfortunately, these systems have to deal
with some of the following inevitable problems [2]: (a) Noisy
data: poor lighting on a user's face or occlusion are examples
of noisy data. (b) Non-universality: a biometric system
based on a single source of evidence may not be able to
capture meaningful data from some users. For instance, an
iris biometric system may extract incorrect texture patterns
from the iris of certain users due to the presence of contact
lenses. (c) Intra-class variations: in the case of fingerprint
recognition, the presence of wrinkles due to wetness [3] can
cause these variations. These types of variations often occur
when a user incorrectly interacts with the sensor. (d) Spoof
attacks: hand signature forgery is an example of this type of
attack. It has been observed that some of the limitations of
unimodal biometric systems can be addressed by deploying
multimodal biometric systems that integrate the
evidence presented by multiple sources of information such
as iris, fingerprints and face. Such systems are less vulnerable
to spoof attacks, as it would be difficult for an imposter to
simultaneously spoof multiple biometric traits of a genuine
user. Because they offer sufficient population coverage, these
systems are also able to address the problem of non-universality.

Classification in multibiometric systems is done by fusing
information from different biometric modalities. The
information fusion can be done at different levels, which
can be broadly divided into feature level, score level and
rank/decision level fusion. Because it preserves the raw
information, feature level fusion can be more discriminative
than score or decision level fusion [4]. However, there have been
very few efforts to explore feature level fusion in the
biometric community. This is because the features extracted
from different sensors differ in type and dimension.
Often the features have large dimensions, and
fusion becomes difficult at the feature level. The prevalent
method is feature concatenation, which has been used for
different multibiometric settings [5]-[7]. However, for
high-dimensional feature vectors, simple feature concatenation may
be inefficient and non-robust. Related work in the machine
learning literature includes Multiple Kernel Learning (MKL),
which aims to integrate information from different features
by learning a weighted combination of the respective kernels. A
detailed survey of the methods for MKL can be found in
[8]. However, for multimodal systems it is important to determine
the weights during testing, based on the quality of the different
modalities. Such a framework is not feasible in the MKL setting.
Methods like [9], [10] try to exploit information from labeled
and unlabeled data from a different view to improve classifier
performance. Similarly, SVM-2k [11] jointly learns SVMs for
two views. However, these methods are difficult to generalize to
the multimodal setting that is common in biometric fusion. A Fisher
discriminant analysis based method has also been proposed for
integrating multiple views [12], but it is also similar to MKL
with kernel Fisher discriminant analysis as the base learner
[13].

In recent years, theories of Sparse Representation (SR) and
Compressed Sensing (CS) have emerged as powerful tools
for efficient processing of data in non-traditional ways [14].
This has led to a resurgence of interest in the principles
of SR and CS for biometrics recognition [15]. Wright et
al. [16] proposed the seminal sparse representation-based
classification (SRC) algorithm for face recognition. It was
shown that by exploiting the inherent sparsity of data, one
can obtain improved recognition performance over traditional
methods, especially when the data is contaminated by various
artifacts such as illumination variations, disguise, occlusion
and random pixel corruption. Pillai et al. extended this work
for robust cancelable iris recognition in [17]. Nagesh and Li
[18] presented an expression-invariant face recognition method
using distributed CS and joint sparsity models. Patel et al. [19]
proposed a dictionary-based method for face recognition under
varying pose and illumination. A discriminative dictionary
learning method for face recognition was also proposed by
Zhang and Li [20]. For a survey of applications of SR and CS
algorithms to biometric recognition, see [14], [15], [21], [22]
and the references therein.

Fig. 1: Overview of our algorithm.

Motivated by the success of SR in unimodal biometric
recognition, we propose a joint sparsity-based algorithm
for multimodal biometrics recognition. Figure 1 presents an
overview of our framework. It is based on the well known
regularized regression method, multi-task multivariate Lasso
[23], [24]. Our method imposes common sparsities both within
each biometric modality and across different modalities. Note
that our method is different from some of the previously proposed
classification algorithms based on joint sparse representation.
For example, Yuan and Yan [25] proposed a multi-task
sparse linear regression model for image classification. This
method uses group sparsity to combine different features of
an object for classification. Zhang et al. [26] proposed a joint
dynamic sparse representation model for object recognition.
Their essential goal was to recognize the same object viewed
from multiple observations, i.e., different poses. Our method
is more general in that it can deal with both multimodal and
multivariate sparse representations.

This paper makes the following contributions:

• We present a robust feature level fusion algorithm for
multibiometric recognition. Through the proposed joint
sparse framework, we can easily handle different dimensions
of different modalities by forcing different features
to interact through their sparse coefficients. Furthermore,
the proposed algorithm can efficiently handle large
dimensional feature vectors.

• We make the classification robust to occlusion and noise
by introducing an error term into the optimization framework.

• The algorithm is easily generalizable to handle multiple
test inputs from a modality.

• We introduce a quality measure for multimodal fusion
based on the joint sparse representation.

• Lastly, we kernelize the algorithm to handle non-linearity
in the data samples.

A  preliminary  version  of  this  work  appeared  in  [27],  which 
describes  just  the  linear  version  of  the  algorithm,  robust 
to  noise  and  occlusion.  Furthermore,  extensive  experimental 
evaluations  are  presented  here. 

A.  Paper  Organization 

The paper is organized as follows. In section II, we describe
the proposed sparsity-based multimodal recognition algorithm,
which is kernelized in section IV. The quality measure is
described in section III. Experimental evaluations on a comprehensive
multimodal dataset and a face database are described in
section V. Finally, in section VI, we discuss the computational
complexity of the method. Concluding remarks are presented
in section VII.

II. Joint sparsity-based multimodal biometrics recognition

Consider a multimodal $C$-class classification problem with
$D$ different biometric traits. Suppose there are $p_i$ training
samples in each biometric trait. For each biometric trait
$i = 1, \ldots, D$, we denote

$$X^i = [X^i_1, X^i_2, \ldots, X^i_C]$$

as an $n_i \times p_i$ dictionary of training samples consisting of $C$
sub-dictionaries $X^i_k$'s corresponding to $C$ different classes.
Each sub-dictionary

$$X^i_j = [x^i_{j,1}, x^i_{j,2}, \ldots, x^i_{j,p_j}] \in \mathbb{R}^{n_i \times p_j}$$

represents a set of training data from the $i$th modality labeled
with the $j$th class. Note that $n_i$ is the feature dimension of each
sample and there are $p_j$ training samples in class $j$.
Hence, there are a total of $p = \sum_{j=1}^{C} p_j$ samples in the
dictionary $X^i$. Elements of the dictionary are often referred
to as atoms. In the multimodal biometrics recognition problem,
given a test sample (matrix) $Y$, which consists of $D$ different
modalities $\{Y^1, Y^2, \ldots, Y^D\}$ where each sample $Y^i$ consists
of $d_i$ observations $Y^i = [y^i_1, y^i_2, \ldots, y^i_{d_i}] \in \mathbb{R}^{n_i \times d_i}$,
the objective is to identify the class to which the test sample $Y$
belongs. In what follows, we present a multimodal multivariate
sparse representation-based algorithm for this problem [23],
[24], [28].

A. Multimodal multivariate sparse representation

We want to exploit the joint sparsity of coefficients from
different biometric modalities to make a joint decision. To
simplify this model, let us consider a bi-modal classification
problem where the test sample $Y = [Y^1, Y^2]$ consists of two
different modalities such as iris and face. Suppose that $Y^1$
belongs to the $j$th class. Then, it can be reconstructed by a
linear combination of the atoms in the sub-dictionary $X^1_j$. That
is, $Y^1 = X^1\Gamma^1 + N^1$, where $\Gamma^1$ is a sparse matrix with only
$p_j$ nonzero rows associated with the $j$th class and $N^1$ is the
noise matrix. Similarly, since $Y^2$ represents the same subject,
it belongs to the same class and can be represented by training
samples in $X^2_j$ with a different set of coefficients $\Gamma^2_j$. Thus, we
can write $Y^2 = X^2\Gamma^2 + N^2$, where $\Gamma^2$ is a sparse matrix that
has the same sparsity pattern as $\Gamma^1$. If we let $\Gamma = [\Gamma^1, \Gamma^2]$,
then $\Gamma$ is a sparse matrix with only $p_j$ nonzero rows.

In the more general case where we have $D$ modalities,
if we denote $\{Y^i\}_{i=1}^{D}$ as a set of $D$ observations, each
consisting of $d_i$ samples from each modality, and let
$\Gamma = [\Gamma^1, \Gamma^2, \ldots, \Gamma^D] \in \mathbb{R}^{p \times d}$ be the matrix formed by
concatenating the coefficient matrices, with $d = \sum_{i=1}^{D} d_i$, then we
can seek the row-sparse matrix $\Gamma$ by solving the following
$\ell_1/\ell_q$-regularized least squares problem

$$\hat{\Gamma} = \arg\min_{\Gamma} \frac{1}{2}\sum_{i=1}^{D} \|Y^i - X^i\Gamma^i\|_F^2 + \lambda\|\Gamma\|_{1,q}, \tag{1}$$

where $\lambda$ is a positive parameter and $q$ is set greater than 1 to
make the optimization problem convex. Here, $\|\Gamma\|_{1,q}$ is a norm
defined as $\|\Gamma\|_{1,q} = \sum_{k=1}^{p} \|\gamma^k\|_q$, where the $\gamma^k$'s are the row
vectors of $\Gamma$, and $\|Y\|_F$ is the Frobenius norm of the matrix
$Y$ defined as $\|Y\|_F = \sqrt{\sum_{i,j} Y_{i,j}^2}$. Once $\hat{\Gamma}$ is obtained, the
class label associated with an observed vector is then declared
as the one that produces the smallest approximation error

$$\hat{j} = \arg\min_{j} \sum_{i=1}^{D} \|Y^i - X^i\delta^i_j(\hat{\Gamma}^i)\|_F^2, \tag{2}$$

where $\delta^i_j$ is the matrix indicator function defined by keeping
rows corresponding to the $j$th class and setting all other rows
equal to zero. Note that the optimization problem (1) reduces
to the conventional Lasso [29] when $D = 1$ and $d = 1$. In
the case when $D = 1$, (1) is referred to as multivariate Lasso
[23].
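The mixed norm in (1) and the decision rule (2) can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation: the function names, the `atom_labels` array (class label of each dictionary atom, assumed to be shared by all modalities) and the list-of-matrices layout are assumptions made for the example.

```python
import numpy as np

def l1q_norm(Gamma, q=2):
    """||Gamma||_{1,q}: sum of the l_q norms of the rows of Gamma."""
    return float(np.sum(np.linalg.norm(Gamma, ord=q, axis=1)))

def classify(Y_list, X_list, Gamma_list, atom_labels):
    """Rule (2): return the class with the smallest total reconstruction error.

    Y_list[i]     : (n_i, d_i) test observations of modality i
    X_list[i]     : (n_i, p)   training dictionary of modality i
    Gamma_list[i] : (p, d_i)   coefficient matrix of modality i
    atom_labels   : (p,)       class label of each atom (assumed layout)
    """
    classes = np.unique(atom_labels)
    errors = []
    for c in classes:
        # delta_j: keep the rows belonging to class c, zero out the rest
        mask = (atom_labels == c)[:, None]
        err = sum(np.linalg.norm(Y - X @ (G * mask), "fro") ** 2
                  for Y, X, G in zip(Y_list, X_list, Gamma_list))
        errors.append(err)
    return classes[int(np.argmin(errors))]
```

Because the residuals of all modalities are summed before the minimum is taken, a modality with a poor fit can be outvoted by the others, which is the intended effect of the joint decision.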


B.  Robust  multimodal  multivariate  sparse  representation 

In this section, we consider a more general problem where
the data is contaminated by noise. In this case, the observation
model can be written as

$$Y^i = X^i\Gamma^i + Z^i + N^i, \quad i = 1, \ldots, D, \tag{3}$$

where $N^i$ is a small dense additive noise and $Z^i \in \mathbb{R}^{n_i \times d_i}$
is a matrix of background noise (occlusion) with arbitrarily
large magnitude. One can assume that each $Z^i$ is sparsely
represented in some basis $B^i$. That is, $Z^i = B^i\Lambda^i$
for some sparse matrices $\Lambda^i$. Hence, (3) can be
rewritten as

$$Y^i = X^i\Gamma^i + B^i\Lambda^i + N^i, \quad i = 1, \ldots, D. \tag{4}$$

With this model, one can simultaneously recover the coefficients
$\Gamma^i$ and $\Lambda^i$ by taking advantage of the fact that the $\Lambda^i$ are
sparse:

$$\hat{\Gamma}, \hat{\Lambda} = \arg\min_{\Gamma, \Lambda} \frac{1}{2}\sum_{i=1}^{D} \|Y^i - X^i\Gamma^i - B^i\Lambda^i\|_F^2 + \lambda_1\|\Gamma\|_{1,q} + \lambda_2\|\Lambda\|_1, \tag{5}$$

where $\lambda_1$ and $\lambda_2$ are positive parameters and
$\Lambda = [\Lambda^1, \Lambda^2, \ldots, \Lambda^D]$ is the sparse coefficient matrix corresponding
to occlusion. The $\ell_1$-norm of the matrix $\Lambda$ is defined as
$\|\Lambda\|_1 = \sum_{i,j} |\Lambda_{i,j}|$. Note that the idea of exploiting the
sparsity of the occlusion term has been studied by Wright et al.
[16] and Candes et al. [30].

Once $\hat{\Gamma}, \hat{\Lambda}$ are computed, the effect of occlusion can be
removed by setting $\tilde{Y}^i = Y^i - B^i\hat{\Lambda}^i$. One can then declare
the class label associated with an observed vector as

$$\hat{j} = \arg\min_{j} \sum_{i=1}^{D} \|Y^i - X^i\delta^i_j(\hat{\Gamma}^i) - B^i\hat{\Lambda}^i\|_F^2. \tag{6}$$

C.  Optimization  algorithm 

Optimization problem (5) is convex but difficult to solve due
to the joint sparsity constraint. In this section, we present an
approach based on the classical alternating direction method
of multipliers (ADMM) [31], [32]. Note that the optimization
problem (1) can be solved by setting $\lambda_2$ equal to zero. Let

$$\mathcal{C}(\Gamma, \Lambda) = \frac{1}{2}\sum_{i=1}^{D} \|Y^i - X^i\Gamma^i - B^i\Lambda^i\|_F^2.$$

Then, our goal is to solve the following optimization problem

$$\min_{\Gamma, \Lambda} \mathcal{C}(\Gamma, \Lambda) + \lambda_1\|\Gamma\|_{1,q} + \lambda_2\|\Lambda\|_1. \tag{7}$$

In ADMM, the idea is to decouple $\mathcal{C}(\Gamma, \Lambda)$, $\|\Gamma\|_{1,q}$ and $\|\Lambda\|_1$
by introducing auxiliary variables to reformulate the problem
into a constrained optimization problem

$$\min_{\Gamma, \Lambda, U, V} \mathcal{C}(\Gamma, \Lambda) + \lambda_1\|V\|_{1,q} + \lambda_2\|U\|_1 \quad \text{s.t.} \quad \Gamma = V, \; \Lambda = U. \tag{8}$$

Since (8) is an equality-constrained problem, the Augmented
Lagrangian method (ALM) [31] can be used to solve the
problem. This can be done by minimizing the augmented
Lagrangian function $f_{\alpha_\Gamma, \alpha_\Lambda}(\Gamma, \Lambda, V, U; A_\Lambda, A_\Gamma)$ defined as

$$\mathcal{C}(\Gamma, \Lambda) + \lambda_2\|U\|_1 + \langle A_\Lambda, \Lambda - U \rangle + \frac{\alpha_\Lambda}{2}\|\Lambda - U\|_F^2 + \lambda_1\|V\|_{1,q} + \langle A_\Gamma, \Gamma - V \rangle + \frac{\alpha_\Gamma}{2}\|\Gamma - V\|_F^2, \tag{9}$$

where $A_\Lambda$ and $A_\Gamma$ are the multipliers of the two linear
constraints, and $\alpha_\Lambda, \alpha_\Gamma$ are the positive penalty parameters.
The ALM algorithm minimizes $f_{\alpha_\Gamma, \alpha_\Lambda}(\Gamma, \Lambda, V, U; A_\Lambda, A_\Gamma)$ with
respect to $\Gamma$, $\Lambda$, $U$ and $V$ jointly, keeping $A_\Gamma$ and $A_\Lambda$ fixed,
and then updates $A_\Gamma$ and $A_\Lambda$ keeping the remaining variables
fixed. Due to the separable structure of the objective function
$f_{\alpha_\Gamma, \alpha_\Lambda}$, one can further simplify the problem by minimizing
$f_{\alpha_\Gamma, \alpha_\Lambda}$ with respect to the variables $\Gamma$, $\Lambda$, $U$ and $V$ separately.
The different steps of the algorithm are given in Algorithm 1.
In what follows, we describe each of the sub-optimization
problems in detail.

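Algorithm 1: Alternating Direction Method of Multipliers
(ADMM).

Initialize: $\Gamma_0, U_0, V_0, A_{\Lambda,0}, A_{\Gamma,0}, \alpha_\Gamma, \alpha_\Lambda$

While not converged do

1. $\Gamma_{t+1} = \arg\min_{\Gamma} f_{\alpha_\Gamma, \alpha_\Lambda}(\Gamma, \Lambda_t, V_t, U_t; A_{\Lambda,t}, A_{\Gamma,t})$

2. $\Lambda_{t+1} = \arg\min_{\Lambda} f_{\alpha_\Gamma, \alpha_\Lambda}(\Gamma_{t+1}, \Lambda, V_t, U_t; A_{\Lambda,t}, A_{\Gamma,t})$

3. $U_{t+1} = \arg\min_{U} f_{\alpha_\Gamma, \alpha_\Lambda}(\Gamma_{t+1}, \Lambda_{t+1}, V_t, U; A_{\Lambda,t}, A_{\Gamma,t})$

4. $V_{t+1} = \arg\min_{V} f_{\alpha_\Gamma, \alpha_\Lambda}(\Gamma_{t+1}, \Lambda_{t+1}, V, U_{t+1}; A_{\Lambda,t}, A_{\Gamma,t})$

5. $A_{\Lambda,t+1} = A_{\Lambda,t} + \alpha_\Lambda(\Lambda_{t+1} - U_{t+1})$

6. $A_{\Gamma,t+1} = A_{\Gamma,t} + \alpha_\Gamma(\Gamma_{t+1} - V_{t+1})$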


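1) Update step for $\Gamma$: The first sub-optimization problem
involves the minimization of $f_{\alpha_\Gamma, \alpha_\Lambda}(\Gamma, \Lambda, V, U; A_\Lambda, A_\Gamma)$
with respect to $\Gamma$. It has a quadratic structure, which is easy
to solve by setting the first-order derivative equal to zero.
Furthermore, since the loss function $\mathcal{C}(\Gamma, \Lambda)$ is a sum of convex
functions associated with the sub-matrices $\Gamma^i$, one can solve for each
$\Gamma^i_{t+1}$, $i = 1, \ldots, D$, separately, which gives the following solution

$$\Gamma^i_{t+1} = (X^{i\,T}X^i + \alpha_\Gamma I)^{-1}\left(X^{i\,T}(Y^i - \Lambda^i_t) + \alpha_\Gamma V^i_t - A^i_{\Gamma,t}\right),$$

where $I$ is the $p \times p$ identity matrix and $\Lambda^i_t$, $V^i_t$ and $A^i_{\Gamma,t}$ are
sub-matrices of $\Lambda_t$, $V_t$ and $A_{\Gamma,t}$, respectively.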

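2) Update step for $\Lambda$: The second sub-optimization problem
is similar in nature, and its solution is given below

$$\Lambda^i_{t+1} = (1 + \alpha_\Lambda)^{-1}\left(Y^i - X^i\Gamma^i_{t+1} + \alpha_\Lambda U^i_t - A^i_{\Lambda,t}\right),$$

where $U^i_t$ and $A^i_{\Lambda,t}$ are sub-matrices of $U_t$ and $A_{\Lambda,t}$, respectively.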
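The two closed-form updates above can be sketched as follows for a single modality. This is an illustrative sketch under the assumption that the occlusion basis is the identity, which is what the $(1 + \alpha_\Lambda)^{-1}$ factor in the $\Lambda$ update implies; the function and argument names are hypothetical.

```python
import numpy as np

def update_gamma(X, Y, Lam, V, A_gamma, alpha_gamma):
    """Gamma^i update: solve (X^T X + alpha I) G = X^T (Y - Lam) + alpha V - A."""
    p = X.shape[1]
    lhs = X.T @ X + alpha_gamma * np.eye(p)
    rhs = X.T @ (Y - Lam) + alpha_gamma * V - A_gamma
    # Solving the linear system avoids forming the inverse explicitly.
    return np.linalg.solve(lhs, rhs)

def update_lambda(X, Y, Gamma, U, A_lam, alpha_lam):
    """Lambda^i update for an identity occlusion basis (assumption)."""
    return (Y - X @ Gamma + alpha_lam * U - A_lam) / (1.0 + alpha_lam)
```

Since the matrix $X^{i\,T}X^i + \alpha_\Gamma I$ does not change between iterations, it can be factorized once outside the ADMM loop and reused.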


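3) Update step for $U$: The third sub-optimization problem
is with respect to $U$, which is a standard minimization
problem that can be recast as

$$\min_{U} \frac{1}{2}\|\Lambda_{t+1} + \alpha_\Lambda^{-1}A_{\Lambda,t} - U\|_F^2 + \frac{\lambda_2}{\alpha_\Lambda}\|U\|_1. \tag{10}$$

Equation (10) is the well-known shrinkage problem whose
solution is given by

$$U_{t+1} = S\left(\Lambda_{t+1} + \alpha_\Lambda^{-1}A_{\Lambda,t}, \; \frac{\lambda_2}{\alpha_\Lambda}\right),$$

where $S(a, b) = \mathrm{sgn}(a)(|a| - b)$ for $|a| > b$ and zero
otherwise, applied element-wise.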
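The shrinkage operator $S(a, b)$ acts element-wise, so it can be written as a one-line NumPy function. The function name is an assumption made for this sketch.

```python
import numpy as np

def soft_threshold(a, b):
    """Element-wise S(a, b) = sgn(a) * (|a| - b) if |a| > b, and 0 otherwise."""
    return np.sign(a) * np.maximum(np.abs(a) - b, 0.0)
```

Entries whose magnitude is below the threshold are set to zero, which is what makes the recovered occlusion term sparse.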

4) Update step for $V$: The final sub-optimization problem
is with respect to $V$, which can be reformulated as

$$\min_{V} \frac{1}{2}\|\Gamma_{t+1} + \alpha_\Gamma^{-1}A_{\Gamma,t} - V\|_F^2 + \frac{\lambda_1}{\alpha_\Gamma}\|V\|_{1,q}. \tag{11}$$

Due to the separable structure of (11), it can be solved by
minimizing with respect to each row of $V$ separately. Let
$\gamma_{i,t+1}$, $a_{\Gamma,i,t}$ and $v_{i,t+1}$ be rows of the matrices $\Gamma_{t+1}$, $A_{\Gamma,t}$ and
$V_{t+1}$, respectively. Then for each $i = 1, \ldots, p$ we solve the
following sub-problem

$$v_{i,t+1} = \arg\min_{v} \frac{1}{2}\|z - v\|_2^2 + \eta\|v\|_q, \tag{12}$$

where $z = \gamma_{i,t+1} + \alpha_\Gamma^{-1}a_{\Gamma,i,t}$ and $\eta = \lambda_1/\alpha_\Gamma$. One can derive the
solution for (12) for any $q$. In this paper, we only focus on
the case when $q = 2$. The solution of (12) then has the following
form

$$v_{i,t+1} = \left(1 - \frac{\eta}{\|z\|_2}\right)_+ z,$$

where $(v)_+$ is a vector with entries receiving values
$\max(v_i, 0)$.
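For $q = 2$, sub-problem (12) is solved by scaling each row toward zero by an amount that depends on its $\ell_2$ norm. The sketch below applies this to all rows at once; the function name and the small constant guarding against division by zero are assumptions made for the example.

```python
import numpy as np

def row_shrink(Z, eta):
    """Row-wise solution of (12) for q = 2: v = max(1 - eta / ||z||_2, 0) * z."""
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows (their result is zero anyway).
    scale = np.maximum(1.0 - eta / np.maximum(norms, 1e-12), 0.0)
    return scale * Z
```

Rows whose norm is below $\eta$ are zeroed as a whole, which is how the shared row-sparsity pattern across modalities is enforced.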

Our proposed Sparse Multimodal Biometrics Recognition
(SMBR) method is summarized in Algorithm 2. We refer to
the robust method that takes the sparse error into account as SMBR-E
(SMBR with error), and to the initial case where it is not taken
into account as SMBR-WE (SMBR without error).

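Algorithm 2: Sparse Multimodal Biometrics Recognition
(SMBR).

Input: Training samples $\{X^i\}_{i=1}^{D}$, test sample $\{Y^i\}_{i=1}^{D}$, occlusion
basis $\{B^i\}_{i=1}^{D}$

Procedure: Obtain $\hat{\Gamma}$ and $\hat{\Lambda}$ by solving

$$\hat{\Gamma}, \hat{\Lambda} = \arg\min_{\Gamma, \Lambda} \frac{1}{2}\sum_{i=1}^{D} \|Y^i - X^i\Gamma^i - B^i\Lambda^i\|_F^2 + \lambda_1\|\Gamma\|_{1,q} + \lambda_2\|\Lambda\|_1$$

Output:

$$\mathrm{identity}(Y) = \arg\min_{j} \sum_{i=1}^{D} \|Y^i - X^i\delta^i_j(\hat{\Gamma}^i) - B^i\hat{\Lambda}^i\|_F^2.$$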
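Putting the update steps together, the ADMM iteration can be sketched as below. This is a minimal sketch and not the authors' code: it assumes $q = 2$, an identity occlusion basis, a fixed number of iterations instead of a convergence test, and hypothetical default parameter values. It returns the auxiliary variables $V$ and $U$, which are the exactly sparse copies of $\Gamma$ and $\Lambda$.

```python
import numpy as np

def smbr_admm(X_list, Y_list, lam1=0.01, lam2=0.01,
              alpha_g=1.0, alpha_l=1.0, n_iter=200):
    """Sketch of the ADMM iteration for problem (5) with q = 2 and B^i = I."""
    D = len(X_list)
    p = X_list[0].shape[1]
    d = [Y.shape[1] for Y in Y_list]
    splits = np.cumsum(d)[:-1]                  # column boundaries between modalities
    Gam = [np.zeros((p, di)) for di in d]
    V = [g.copy() for g in Gam]
    A_g = [g.copy() for g in Gam]
    Lam = [np.zeros_like(Y, dtype=float) for Y in Y_list]
    U = [l.copy() for l in Lam]
    A_l = [l.copy() for l in Lam]
    # The system matrices are constant, so invert them once.
    inv = [np.linalg.inv(X.T @ X + alpha_g * np.eye(p)) for X in X_list]

    for _ in range(n_iter):
        for i in range(D):
            X, Y = X_list[i], Y_list[i]
            # Steps 1-2: closed-form quadratic updates for Gamma^i and Lambda^i
            Gam[i] = inv[i] @ (X.T @ (Y - Lam[i]) + alpha_g * V[i] - A_g[i])
            Lam[i] = (Y - X @ Gam[i] + alpha_l * U[i] - A_l[i]) / (1.0 + alpha_l)
            # Step 3: element-wise shrinkage for U^i
            T = Lam[i] + A_l[i] / alpha_l
            U[i] = np.sign(T) * np.maximum(np.abs(T) - lam2 / alpha_l, 0.0)
        # Step 4: row-wise shrinkage on the concatenated matrix couples the modalities
        Z = np.hstack([Gam[i] + A_g[i] / alpha_g for i in range(D)])
        norms = np.maximum(np.linalg.norm(Z, axis=1, keepdims=True), 1e-12)
        V_cat = np.maximum(1.0 - (lam1 / alpha_g) / norms, 0.0) * Z
        V = np.split(V_cat, splits, axis=1)
        # Steps 5-6: multiplier updates
        for i in range(D):
            A_l[i] = A_l[i] + alpha_l * (Lam[i] - U[i])
            A_g[i] = A_g[i] + alpha_g * (Gam[i] - V[i])
    return V, U
```

The only step that mixes information across modalities is the row-wise shrinkage of the concatenated matrix; every other update is computed per modality, which is why feature vectors of different dimensions can be fused without concatenating them.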


III.  Quality  based  fusion 

Ideally, a fusion mechanism should give more weight to
the more reliable modalities. Hence, the concept of quality
is important in multimodal fusion. A quality measure based
on sparse representation was introduced for faces in [16]. To
decide whether a given test sample has good quality or not,
its Sparsity Concentration Index (SCI) was calculated. Given
a coefficient vector $\gamma \in \mathbb{R}^p$, the SCI is given as:

$$\mathrm{SCI}(\gamma) = \frac{C \cdot \max_{i \in \{1, \ldots, C\}} \|\delta_i(\gamma)\|_1 / \|\gamma\|_1 - 1}{C - 1},$$

where $\delta_i$ is the indicator function keeping the coefficients
corresponding to the $i$th class and setting the others to zero. SCI
values close to 1 correspond to the case where the test sample
can be represented well using the samples of a single class,
and hence is of high quality. On the other hand, samples with SCI
close to 0 are not similar to any of the classes, and hence are
of poor quality. This can be easily extended to the multimodal
case using the joint sparse representation matrix $\hat{\Gamma}$. In this
case, we can define the quality $q^i_j$ for sample $y^i_j$ as:

$$q^i_j = \mathrm{SCI}(\hat{\Gamma}^i_j),$$

where $\hat{\Gamma}^i_j$ is the $j$th column of $\hat{\Gamma}^i$. Given this quality measure,
the classification rule (2) can be modified to include the quality
measure:

$$\hat{j} = \arg\min_{j} \sum_{i=1}^{D}\sum_{k=1}^{d_i} q^i_k \|y^i_k - X^i\delta_j(\hat{\Gamma}^i_k)\|_2^2, \tag{13}$$

where $\delta_j$ is the indicator function retaining the coefficients
corresponding to the $j$th class.
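The SCI and the per-observation quality weights can be sketched as follows. The function names and the `atom_labels` array (class label of each dictionary atom) are assumptions made for this example; a zero coefficient vector is mapped to a quality of 0 by convention here, which the text does not specify.

```python
import numpy as np

def sci(gamma, atom_labels):
    """Sparsity Concentration Index of a coefficient vector, in [0, 1]."""
    classes = np.unique(atom_labels)
    C = len(classes)
    total = np.sum(np.abs(gamma))
    if total == 0 or C < 2:
        return 0.0  # convention assumed for degenerate inputs
    # Largest l1 mass concentrated on the atoms of a single class
    best = max(np.sum(np.abs(gamma[atom_labels == c])) for c in classes)
    return float((C * best / total - 1.0) / (C - 1.0))

def column_qualities(Gamma_i, atom_labels):
    """q^i_k = SCI of the k-th column of the coefficient matrix of modality i."""
    return np.array([sci(Gamma_i[:, k], atom_labels)
                     for k in range(Gamma_i.shape[1])])
```

A coefficient vector concentrated on one class gives an SCI of 1, while a vector spread evenly over all classes gives 0, so the weights in (13) down-weight observations whose representation is ambiguous.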

IV. Kernel space multimodal biometrics recognition

The class identities in the multibiometric dataset may not
be linearly separable. Hence, we also extend the sparse multimodal
fusion framework to kernel space. The kernel function,
$\kappa : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$, is defined as the inner product

$$\kappa(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle,$$

where $\phi$ is an implicit mapping projecting the vector $x$ into
a higher dimensional space.

A.  Multivariate  kernel  sparse  representation 

Considering the general case of $D$ modalities with $\{Y^i\}_{i=1}^{D}$
as a set of $d_i$ observations, the feature space representation can
be written as:

$$\Phi(Y^i) = [\phi(y^i_1), \phi(y^i_2), \ldots, \phi(y^i_{d_i})].$$

Similarly, the dictionary of training samples for modality
$i = 1, \ldots, D$ can be represented in feature space as

$$\Phi(X^i) = [\phi(X^i_1), \phi(X^i_2), \ldots, \phi(X^i_C)].$$

As in the joint linear space representation, we have:

$$\Phi(Y^i) = \Phi(X^i)\Gamma^i,$$

where $\Gamma^i$ is the coefficient matrix associated with modality
$i$. Incorporating information from all the sensors, we seek to
solve the following optimization problem similar to the linear
case:

$$\hat{\Gamma} = \arg\min_{\Gamma} \frac{1}{2}\sum_{i=1}^{D} \|\Phi(Y^i) - \Phi(X^i)\Gamma^i\|_F^2 + \lambda\|\Gamma\|_{1,q}, \tag{14}$$

where $\Gamma = [\Gamma^1, \Gamma^2, \ldots, \Gamma^D]$. It is clear that the information
from all modalities is integrated via the shared sparsity
pattern of the matrices $\{\Gamma^i\}_{i=1}^{D}$. This can be reformulated in
terms of kernel matrices as:

$$\hat{\Gamma} = \arg\min_{\Gamma} \frac{1}{2}\sum_{i=1}^{D} \left(\mathrm{trace}(\Gamma^{i\,T}K_{X^i,X^i}\Gamma^i) - 2\,\mathrm{trace}(K_{X^i,Y^i}^{T}\Gamma^i)\right) + \lambda\|\Gamma\|_{1,q}, \tag{15}$$

where the kernel matrix $K_{A,B}$ is defined as:

$$K_{A,B}(i, j) = \langle \phi(a_i), \phi(b_j) \rangle, \tag{16}$$

$a_i$ and $b_j$ being the $i$th and $j$th columns of $A$ and $B$, respectively.
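The step from (14) to (15) uses the expansion $\|\Phi(Y) - \Phi(X)\Gamma\|_F^2 = \mathrm{trace}(K_{Y,Y}) - 2\,\mathrm{trace}(K_{X,Y}^{T}\Gamma) + \mathrm{trace}(\Gamma^{T}K_{X,X}\Gamma)$, where the first term does not depend on $\Gamma$. The sketch below builds a kernel matrix as in (16) and evaluates the data term of (15). The RBF kernel and its `sigma` parameter are assumptions chosen for illustration; this section does not state which kernel is used.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """K(i, j) = exp(-||a_i - b_j||^2 / (2 sigma^2)) for columns a_i of A, b_j of B."""
    sq = (np.sum(A ** 2, axis=0)[:, None]
          + np.sum(B ** 2, axis=0)[None, :]
          - 2.0 * A.T @ B)
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def kernel_fit_term(K_xx, K_xy, Gamma):
    """trace(G^T K_xx G) - 2 trace(K_xy^T G): data term of (15) for one modality."""
    return float(np.trace(Gamma.T @ K_xx @ Gamma)
                 - 2.0 * np.trace(K_xy.T @ Gamma))
```

With the linear kernel ($K_{X,X} = X^{T}X$, $K_{X,Y} = X^{T}Y$) this term equals $\|Y - X\Gamma\|_F^2 - \|Y\|_F^2$, so the kernel formulation reduces to the linear one up to a constant.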

B.  Optimization  Algorithm 

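Similar to the linear fusion method, we apply the alternating
direction method to efficiently solve the problem for kernel
fusion. The method splits the variable $\Gamma$ such that the new
problem has two convex functions. This is done by introducing
a new variable $V$ and reformulating the problems (15) and (??)
as:

$$\arg\min_{\Gamma, V} \frac{1}{2}\sum_{i=1}^{N_k} \left(\mathrm{trace}(\Gamma^{i\,T}K_{X^i,X^i}\Gamma^i) - 2\,\mathrm{trace}(K_{X^i,Y^i}^{T}\Gamma^i)\right) + \lambda\|V\|_{1,q} \quad \text{s.t.} \quad \Gamma = V, \tag{17}$$

where $N_k$ is the number of kernels in (15) and (??).
Rewriting the problem using the Lagrangian multiplier $B$, the
optimization problem becomes:

$$\arg\min_{\Gamma, V} \frac{1}{2}\sum_{i=1}^{N_k} \left(\mathrm{trace}(\Gamma^{i\,T}K_{X^i,X^i}\Gamma^i) - 2\,\mathrm{trace}(K_{X^i,Y^i}^{T}\Gamma^i)\right) + \lambda\|V\|_{1,q} + \langle B, \Gamma - V \rangle + \frac{\beta_W}{2}\|\Gamma - V\|_F^2, \tag{18}$$

which upon re-arranging reduces to:

$$\arg\min_{\Gamma, V} \frac{1}{2}\sum_{i=1}^{N_k} \left(\mathrm{trace}(\Gamma^{i\,T}K_{X^i,X^i}\Gamma^i) - 2\,\mathrm{trace}(K_{X^i,Y^i}^{T}\Gamma^i)\right) + \lambda\|V\|_{1,q} + \frac{\beta_W}{2}\left\|\Gamma - V + \frac{1}{\beta_W}B\right\|_F^2. \tag{19}$$

The optimization method is summarized in Algorithm 3. It
should be pointed out that each step has a simple closed-form
expression.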

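Algorithm 3: Alternating Direction Method of Multipliers
(ADMM) in kernel space.

Initialize: $\Gamma_0, V_0, B_0, \beta_W$

While not converged do

1. $\Gamma_{t+1} = \arg\min_{\Gamma} \frac{1}{2}\sum_{i=1}^{N_k} \left(\mathrm{trace}(\Gamma^{i\,T}K_{X^i,X^i}\Gamma^i) - 2\,\mathrm{trace}(K_{X^i,Y^i}^{T}\Gamma^i)\right) + \lambda\|V_t\|_{1,q} + \frac{\beta_W}{2}\left\|\Gamma - V_t + \frac{1}{\beta_W}B_t\right\|_F^2$

2. $V_{t+1} = \arg\min_{V} \lambda\|V\|_{1,q} + \frac{\beta_W}{2}\left\|\Gamma_{t+1} - V + \frac{1}{\beta_W}B_t\right\|_F^2$

3. $B_{t+1} = B_t + \beta_W(\Gamma_{t+1} - V_{t+1})$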

1) Update step for $\Gamma_t$: $\Gamma_{t+1}$ is obtained by updating each
sub-matrix $\Gamma^i_t$, $i = 1, \ldots, N_k$, as:

$$\Gamma^i_{t+1} = (K_{X^i,X^i} + \beta_W I)^{-1}\left(K_{X^i,Y^i} + \beta_W V^i_t - B^i_t\right), \tag{20}$$

where $I$ is an identity matrix and $V^i_t$, $B^i_t$ are sub-matrices
of $V_t$ and $B_t$, respectively.
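Update (20) only needs the kernel matrices, never the feature map itself. A minimal sketch for one modality follows; the function and argument names are hypothetical.

```python
import numpy as np

def kernel_gamma_update(K_xx, K_xy, V, B, beta):
    """Update (20): solve (K_xx + beta I) G = K_xy + beta V - B for one modality."""
    p = K_xx.shape[0]
    return np.linalg.solve(K_xx + beta * np.eye(p), K_xy + beta * V - B)
```

For a positive semidefinite kernel matrix and $\beta_W > 0$, the system matrix is positive definite, so the solve is well-posed even when the kernel matrix itself is rank deficient.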
2) Update step for $V_t$: The update equation for $V_t$ is the
same as in the linear fusion case using (11) and (12), replacing
$A_{\Gamma,t}$ and $\alpha_\Gamma$ with $B_t$ and $\beta_W$, respectively.

C. Classification

Once $\hat{\Gamma}$ is obtained, classification can be done by assigning
the class label as:

$$\hat{j} = \arg\min_{j} \sum_{i=1}^{N_k} \|\Phi(Y^i) - \Phi(X^i_j)\hat{\Gamma}^i_j\|_F^2,$$

or in terms of kernel matrices as:

$$\hat{j} = \arg\min_{j} \sum_{i=1}^{N_k} \left(\mathrm{trace}(K_{Y^i,Y^i}) - 2\,\mathrm{trace}(\hat{\Gamma}^{i\,T}_j K_{X^i_j,Y^i}) + \mathrm{trace}(\hat{\Gamma}^{i\,T}_j K_{X^i_j,X^i_j}\hat{\Gamma}^i_j)\right). \tag{21}$$

Here, $X^i_j$ is the sub-dictionary associated with the $j$th class and
$\hat{\Gamma}^i_j$ is the coefficient matrix associated with this class.

The classification rule can be further extended to include
the quality measure as in (13). However, we skip this step here,
as we wish to study the effects of kernel representation and
quality separately.

The Multivariate Kernel Sparse Recognition (kerSMBR) algorithm
is summarized in Algorithm 4.

Algorithm 4: Kernel Sparse Multimodal Biometrics Recognition
(kerSMBR).

Input: Training samples $\{X^i\}_{i=1}^{D}$, test sample $\{Y^i\}_{i=1}^{D}$

Procedure: Obtain $\hat{\Gamma}$ by solving

$$\hat{\Gamma} = \arg\min_{\Gamma} \frac{1}{2}\sum_{i=1}^{D} \|\Phi(Y^i) - \Phi(X^i)\Gamma^i\|_F^2 + \lambda\|\Gamma\|_{1,q} \tag{22}$$

Output:

$$\mathrm{identity}(Y) = \arg\min_{j} \sum_{i=1}^{D} \left(\mathrm{trace}(K_{Y^i,Y^i}) - 2\,\mathrm{trace}(\hat{\Gamma}^{i\,T}_j K_{X^i_j,Y^i}) + \mathrm{trace}(\hat{\Gamma}^{i\,T}_j K_{X^i_j,X^i_j}\hat{\Gamma}^i_j)\right)$$

V. Experiments

We evaluated our algorithm on two publicly available
datasets: the WVU Multimodal dataset [33] and the AR face
dataset [34]. In the first experiment, we tested on the WVU
dataset, which is one of the few publicly available datasets
that allows fusion at the image level. It is a challenging dataset
consisting of samples from different biometric modalities for
each subject.

In the second experiment, we show the applicability of our
method to fusing information from weak biometrics extracted
from face images. In particular, the periocular region has
been shown to be a useful biometric [35]. Similarly, the nose
region has also been explored as a biometric [36]. Sinha et al.
[37] have demonstrated that eyebrows are important for face
recognition. However, each of these sub-regions may not be
as discriminative as the whole face. The challenge for fusion
algorithms is to be able to combine these weak modalities
with a strong modality based on the whole face [38]. We
demonstrate how our framework can be extended to address
this problem. Further, we also show the effects of noise and
occlusion on the performance of different algorithms. In all the
experiments, $B^i$ was set to be identity for convenience, i.e., we
assume background noise to be sparse in the image domain.

A.  WVU  Multimodal  Dataset 

The  WVU  multimodal  dataset  is  a  comprehensive  collection 
of  different  biometric  modalities  such  as  fingerprint,  iris, 
palmprint,  hand  geometry  and  voice  from  subjects  of  different 
age,  gender  and  ethnicity  as  described  in  Table  I.  It  is  a 
challenging  dataset  as  many  of  these  samples  are  corrupted 
with  blur,  occlusion  and  sensor  noise  as  shown  in  Figure  2.  Out 
of  these,  we  chose  iris  and  fingerprint  modalities  for  testing  the 
proposed  algorithms.  In  total,  there  are  2  iris  (right  and  left  iris) 
and  4  fingerprint  modalities.  Also,  the  evaluation  was  done  on 
a  subset  of  219  subjects  having  samples  in  both  modalities. 


Fig.  2:  Examples  of  challenging  images  from  the  WVU 
Multimodal  dataset.  The  images  shown  above  suffer  from 
various  artifacts  such  as  sensor  noise,  blur  and  occlusion. 


Biometric Modality   # of subjects   # of samples
Iris                 244             3099
Fingerprint          272             7219
Palm                 263             683
Hand                 217             3062
Voice                274             714

TABLE I: WVU Biometric Data


1) Preprocessing: Robust pre-processing of images was
done before feature extraction. Iris images were segmented
using the method proposed in [39]. Following the segmentation
step, 25 × 240 iris templates were generated by
resampling using the publicly available code of Masek et al.
[40]. Fingerprint images were enhanced using the filtering
methods described in [41], and then the core point was
detected from the enhanced images [42]. Features were then
extracted around the detected core point.

2) Feature Extraction: Gabor features were extracted from
the processed images, as they have been shown to give good
performance on both fingerprints [42] and iris [43]. For
fingerprint samples, the processed images were convolved with
Gabor filters at 8 different orientations. Circular tessellations
were extracted around the core point for all the filtered images,
similar to [42]. The tessellation consisted of 15 concentric
bands, each of width 5 pixels and divided into 30 sectors. The
mean values for each sector were concatenated to form a
feature vector of size 3600 × 1 (8 orientations × 15 bands × 30
sectors). Features for iris images were
formed by convolving the templates with a log-Gabor filter at
a single scale, and vectorizing the 25 × 240 template to give a
6000 × 1 dimensional feature.
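The tessellation step can be sketched as follows to make the feature dimension concrete. This is a hypothetical sketch, not the pipeline of [42]: the Gabor filtering is not shown, `filtered_images` stands for the 8 filtered images, `core` is the detected core point in (row, column) coordinates, and cells that contain no pixel are assigned a mean of 0 by convention.

```python
import numpy as np

def sector_means(filtered, core, n_bands=15, band_width=5, n_sectors=30):
    """Mean response in each (band, sector) cell around `core` for one image."""
    rows, cols = np.indices(filtered.shape)
    dy, dx = rows - core[0], cols - core[1]
    r = np.hypot(dx, dy)
    theta = np.mod(np.arctan2(dy, dx), 2.0 * np.pi)
    band = (r // band_width).astype(int)
    sector = np.minimum((theta / (2.0 * np.pi) * n_sectors).astype(int),
                        n_sectors - 1)
    inside = band < n_bands                     # pixels within the outermost band
    cell = band[inside] * n_sectors + sector[inside]
    n_cells = n_bands * n_sectors
    sums = np.bincount(cell, weights=filtered[inside], minlength=n_cells)
    counts = np.bincount(cell, minlength=n_cells)
    return sums / np.maximum(counts, 1)         # empty cells yield 0

def fingerprint_feature(filtered_images, core):
    """Concatenate the sector means of all filtered images into one vector."""
    return np.concatenate([sector_means(f, core) for f in filtered_images])
```

With 8 filtered images, 15 bands and 30 sectors, the concatenated vector has 8 × 15 × 30 = 3600 entries, matching the feature size stated above.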
Fig. 3: CMCs (Cumulative Match Curve) for individual modalities using (a) SMBR-E, (b) SMBR-WE, (c) SLR and (d) SVM
methods.

Fig. 4: CMCs for multimodal fusion using (a) four fingerprints, (b) two irises and (c) all modalities. Curves are shown
for SMBR-E, SMBR-WE, SLR-Sum, SVM-Sum, SLR-Major, SVM-Major and MKLFusion.
            Finger 1     Finger 2     Finger 3     Finger 4     Iris 1       Iris 2
SMBR-WE     68.1 ± 1.1   88.4 ± 1.2   69.2 ± 1.5   87.5 ± 1.5   60.0 ± 1.5   62.1 ± 0.4
SMBR-E      67.1 ± 1.0   87.9 ± 0.8   67.4 ± 1.9   86.9 ± 1.5   62.5 ± 1.2   64.3 ± 1.0
SLR         67.4 ± 1.9   87.9 ± 1.3   66.0 ± 2.2   87.5 ± 1.3   57.1 ± 3.0   57.9 ± 2.7
SVM         41.1 ± 5.0   75.5 ± 2.2   49.2 ± 1.6   67.0 ± 8.3   44.3 ± 1.2   45.0 ± 2.9

TABLE II: Rank one recognition performance for individual modalities.


                 SMBR-WE      SMBR-E       SLR-Sum      SLR-Major    SVM-Sum      SVM-Major    MKLFusion
4 Fingerprints   97.9 ± 0.4   97.6 ± 0.6   96.3 ± 0.8   74.2 ± 0.7   90.0 ± 2.2   73.0 ± 1.5   86.2 ± 1.2
2 Irises         76.5 ± 1.6   78.2 ± 1.2   72.7 ± 4.0   64.2 ± 2.7   62.8 ± 2.6   49.3 ± 2.0   76.8 ± 2.5
All modalities   98.7 ± 0.2   98.6 ± 0.5   97.6 ± 0.4   84.4 ± 0.9   94.9 ± 1.5   81.3 ± 1.7   89.8 ± 0.9

TABLE III: Rank one recognition performance (%) for the WVU Multimodal dataset.


                 SMBR-WE      SMBR-E       SLR-Sum      SLR-Major    SVM-Sum      SVM-Major
4 Fingerprints   98.2 ± 0.5   98.1 ± 0.5   97.5 ± 0.5   86.3 ± 0.6   93.6 ± 1.6   85.5 ± 0.9
2 Irises         76.9 ± 1.2   78.8 ± 1.7   74.1 ± 1.0   67.2 ± 2.4   64.3 ± 3.3   51.6 ± 2.0
All modalities   98.8 ± 0.4   98.6 ± 0.3   98.2 ± 0.2   93.8 ± 0.9   95.5 ± 1.5   93.3 ± 1.2

TABLE IV: Rank one recognition performance (%) using the proposed quality measure.


3) Experimental Set-up: The dataset was randomly divided into 4 training samples per class (1 sample here is 1 data sample each from 6 modalities) and the remaining 519 samples were used for testing. The recognition result was averaged over 5 runs. The proposed methods were compared with state-of-the-art classification methods such as sparse logistic regression (SLR) [44] and SVM [45]. As these methods cannot handle multiple modalities, we explored score-level and decision-level fusion methods for combining the results of individual modalities. For score-level fusion, the probability outputs for the test sample of each modality were added together to give the final score vector. Classification was based upon the final score values. For decision-level fusion, the class chosen by the maximum number of modalities was taken as the predicted class. We further compared with the efficient multiclass implementation of the MKL algorithm [46]. The proposed linear and kernel fusion techniques were tested separately and compared with the linear and kernel versions of the SLR, SVM and MKL algorithms, respectively. We denote the score-level fusion of these methods as SLR-Sum and SVM-Sum, and the decision-level fusion as SLR-Major and SVM-Major. The MKL based method is denoted as MKLFusion. We report the mean and standard deviation of rank one recognition rates for all the methods. We also show Cumulative Match Curves (CMCs) for all the classifiers. The CMC is a performance measure for biometric recognition systems and has been shown to be equivalent to the ROC of the system [47].
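The two baseline fusion rules described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: each modality's classifier is assumed to output a probability vector over classes, the function names `sum_fusion` and `majority_fusion` are hypothetical, and ties in the vote are resolved in favour of the lowest class index, a detail the paper does not specify.

```python
import numpy as np

def sum_fusion(probs):
    """Score-level fusion: add the per-modality probability vectors.

    probs : (n_modalities, n_classes) array for one test sample.
    """
    return int(np.argmax(np.sum(probs, axis=0)))

def majority_fusion(probs):
    """Decision-level fusion: each modality votes for its top class."""
    votes = np.argmax(probs, axis=1)
    counts = np.bincount(votes, minlength=probs.shape[1])
    return int(np.argmax(counts))  # ties go to the lowest class index

# Two weakly confident modalities disagree with one confident modality.
probs = np.array([[0.45, 0.55, 0.00],
                  [0.45, 0.55, 0.00],
                  [0.90, 0.10, 0.00]])
print(sum_fusion(probs))       # 0: summed scores are [1.8, 1.2, 0.0]
print(majority_fusion(probs))  # 1: two of three modalities vote for class 1
```

The example shows how the two rules can disagree: voting discards the confidence of each modality, while sum fusion retains it.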

a) Linear Fusion: The recognition performance of SMBR-WE and SMBR-E was compared with linear SVM and linear SLR classification methods. The parameters $\lambda_1$ and $\lambda_2$ were set to 0.01.

• Comparison of methods: Figure 3 and Table II show the performance on individual modalities. All the classifiers show a similar trend: performance is lower on the iris images and on fingers 1 and 3. The proposed methods match or exceed the baselines on every modality. Figure 4 and Table III show the recognition performance for different fusion settings. The proposed SMBR approach outperforms the existing classification techniques. SMBR-E and SMBR-WE perform similarly, though the latter is slightly better for fingerprint fusion and for fusion of all modalities. This may be due to the penalty on the sparse error in SMBR-E, since the error may not be sparse in the image domain. Further, sum-based fusion is superior to voting-based fusion. The MKL based method performs well for iris fusion, but its performance drops for the other two settings. This may be because, by weighing the kernels during training, it loses flexibility at test time as the number of modalities increases.

• Fusion with quality: Clearly, different modalities have different levels of performance. Hence, we studied the effect of the proposed quality measure on the performance of different methods. For a consistent comparison, the quality values produced by the SMBR-E method were used for all the algorithms. Table IV shows the performance for the three fusion settings. The effect of including the quality measure can be studied by comparing with Table III. The recognition rate increases for almost every method and fusion setting (SMBR-E on all modalities is unchanged at 98.6%), with the largest gains for the voting-based methods. Again, SMBR-E and SMBR-WE give the best performance among all the methods.

• Effect of joint sparsity: We also studied the effect of the joint sparsity constraint on the recognition performance. For this, the SMBR-WE algorithm was run for different values of $\lambda_1$. Figure 5 shows the variation of rank one recognition across $\lambda_1$ values for different fusion settings. All the curves show a sharp increase in performance as $\lambda_1$ increases from 0. The increase is largest for iris fusion, which shows around 5% improvement at $\lambda_1 = 0.005$ over $\lambda_1 = 0$. This shows that imposing the joint sparsity constraint is important for fusion. Moreover, it helps in regulating fusion performance when the reconstruction error alone is not sufficient to distinguish between different classes. The performance is then stable across $\lambda_1$ values, and starts decreasing slowly after reaching the optimum.

• Variation with number of training samples: We varied the number of training samples and studied the effect on the top four algorithms. Figure 6 shows the variation for fusion of all the modalities. It can be seen that SMBR-WE and SMBR-E are stable across the number of training samples, whereas the performance of the SLR and MKLFusion based methods falls sharply. This fall can be attributed to the discriminative approaches of these methods, as well as to score-based fusion, since fusion further reduces the recognition performance when the individual classifiers are not good.

Fig. 5: Variation of recognition performance with different values of the sparsity constraint, $\lambda_1$.

Fig. 6: Variation of recognition performance with number of training samples.

• Comparison with other score-based fusion methods: Although sum-based fusion is a popular technique for score fusion, some other techniques have also been proposed. We evaluated the performance of the likelihood-based fusion method proposed in [48]. The results are shown in Table V. The method does not perform well: it models the score distribution as a Gaussian mixture model, but this distribution is difficult to model because of the large variations in the data samples. The method is also affected by the curse of dimensionality.


                  2 irises   4 fingerprints   All modalities
SLR-Likelihood    66.6       83.5             75.1
SVM-Likelihood    50.7       31.9             31.0

TABLE V: Fusion performance (%) with the likelihood-based method [48].
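To give some intuition for why the joint sparsity weight $\lambda_1$ studied above matters, the sketch below applies row-wise soft thresholding to a coefficient matrix whose columns correspond to modalities. This is the proximal step commonly associated with an $\ell_{1,2}$ (sum of row norms) penalty; that the penalty takes exactly this form, and the name `row_shrink`, are assumptions of this example. It is not the paper's solver, which is described in an earlier section.

```python
import numpy as np

def row_shrink(G, tau):
    """Row-wise soft thresholding: shrink each row's l2 norm by tau.

    G   : (n_atoms, n_modalities) coefficient matrix.
    tau : non-negative threshold (plays the role of the sparsity weight).
    Rows whose norm is below tau become exactly zero for all modalities.
    """
    norms = np.linalg.norm(G, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return scale * G

G = np.array([[3.0, 4.0],    # atom used strongly by both modalities
              [0.1, 0.1],    # atom used weakly by both
              [0.0, 0.5]])   # atom used weakly by one modality only

print(row_shrink(G, 0.0))  # tau = 0: coefficients are unchanged
print(row_shrink(G, 1.0))  # only the first row survives: [[2.4, 3.2], [0, 0], [0, 0]]
```

With a zero threshold each modality keeps its own support; with a positive threshold, weakly supported atoms are removed from every modality at once, which is the shared-support behaviour that joint sparsity is meant to enforce.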


b) Kernel Fusion: We further compared the performance of the proposed kerSMBR with kernel SVM, kernel SLR and MKLFusion methods. In the experiments, we used the Radial Basis Function (RBF) as the kernel, given as:

$$\kappa(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\frac{\|\mathbf{x}_i - \mathbf{x}_j\|^2}{\sigma^2}\right),$$

$\sigma$ being a parameter to control the width of the RBF. For MKLFusion, we gave linear, polynomial and RBF kernels as the base kernels for learning.
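A kernel matrix for the RBF kernel can be computed as in the sketch below. It assumes the form $\exp(-\|\mathbf{x}_i - \mathbf{x}_j\|^2 / \sigma^2)$; some libraries and papers use $2\sigma^2$ in the denominator instead, so the normalisation should be checked against whichever implementation is being compared. The function name `rbf_kernel` and the row-per-sample layout are choices made for this example.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """RBF kernel matrix between the rows of A (n, d) and B (m, d).

    Entry (i, j) is exp(-||A[i] - B[j]||^2 / sigma^2).
    """
    sq_dist = (np.sum(A ** 2, axis=1)[:, None]
               + np.sum(B ** 2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against small negative round-off
    return np.exp(-sq_dist / sigma ** 2)

rng = np.random.default_rng(0)
X_train = rng.standard_normal((5, 10))
K = rbf_kernel(X_train, X_train, sigma=2.0)
print(K.shape)  # (5, 5)
```

A small $\sigma$ makes the kernel matrix close to the identity (every sample looks unlike the others), while a large $\sigma$ pushes all entries toward 1.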

• Hyperparameter tuning: To fix the value of the hyperparameter $\sigma$, we iterated over different values of $\sigma$, taken from a grid of powers of 2, for one training and test split of the data. The value of $\sigma$ giving the maximum performance was fixed for each modality, and the performance was averaged over a few iterations. The weights $\{a_{ij}\}$ were set to 1 for the composite kernel. $\lambda$ and $\beta_W$ were both set to 0.01.

• Comparison of methods: Figure 7 shows the performance of different methods on individual modalities, and Figure 8 and Table VII on different fusion settings. Comparison with linear fusion shows that the proposed kerSMBR significantly improves the performance on individual iris modalities as well as on iris fusion. The performance on fingerprint modalities is similar; however, the fusion of all 6 modalities (2 irises + 4 fingerprints) shows an improvement of 0.4%. kerSMBR also achieves the best accuracy among all the methods for the different fusion settings. kerSLR scores better than kerSVM in all the cases, and its accuracy is close to that of kerSMBR. The performance of kerSLR is better than that of its linear counterpart, whereas kerSVM does not show much improvement.


Fig. 7: CMCs for individual modalities using (a) kernel SVM, (b) kernel SLR and (c) kerSMBR.

Fig. 8: CMCs for different fusion methods for (a) four fingerprints, (b) two irises and (c) all modalities. Results for composite kernels using different techniques are shown in (d).

            Finger 1     Finger 2     Finger 3     Finger 4     Iris 1       Iris 2
kerSMBR     66.3 ± 1.7   87.1 ± 1.0   69.1 ± 2.1   86.4 ± 1.5   70.3 ± 1.8   71.0 ± 1.6
kerSLR      65.8 ± 1.8   86.9 ± 1.7   68.3 ± 2.0   89.5 ± 1.6   65.1 ± 1.7   66.8 ± 1.1
kerSVM      48.4 ± 5.4   76.7 ± 2.3   50.2 ± 1.9   68.4 ± 7.4   43.9 ± 1.1   44.6 ± 3.0

TABLE VI: Rank one recognition performance (%) for individual modalities using kernel methods.

                 kerSMBR      kerSLR-Sum   kerSLR-Major   kerSVM-Sum   kerSVM-Major   MKLFusion
4 Fingerprints   97.9 ± 0.3   96.8 ± 0.7   75.2 ± 0.7     93.2 ± 1.2   71.4 ± 1.3     88.7 ± 0.9
2 Irises         84.7 ± 1.7   83.7 ± 1.8   75.2 ± 1.2     62.2 ± 2.8   47.8 ± 2.4     76.9 ± 2.4
All modalities   99.1 ± 0.2   98.9 ± 0.1   87.9 ± 0.6     96.3 ± 0.8   79.5 ± 1.6     91.2 ± 1.0

TABLE VII: Rank one recognition performance (%) for different fusion settings using kernel methods.

B. AR Face Dataset

The AR face dataset consists of faces with varying illumination, expression and occlusion conditions, captured in two sessions. We evaluated our algorithms on a set of 100 users. Images from the first session, 7 for each subject, were used for training, and the images from the second session, again 7 per subject, were used for testing. For testing the fusion algorithms, four weak modalities were extracted from the face images: left and right periocular, mouth and nose regions. This was done by applying rectangular masks as shown in Figure 9, and cropping out the respective regions. These, along with the whole face, were taken for fusion. Simple intensity values were used as features for all of them. The experimental set-up was similar to the previous section. The parameter values $\lambda_1$ and $\lambda_2$ were set to 0.003 and 0.002, respectively. Furthermore, we also studied the effect of noise and occlusion on recognition performance.

Fig. 9: Face mask used to crop out different modalities.

• Comparison of methods: Table VIII shows the performance of different algorithms on the face dataset. Here, SR (sparse representation) shows the classification result using just the whole face. The Block Sparse method is a recent block sparsity based face recognition algorithm [50] and FDDL [49] is a state-of-the-art discriminative dictionary based technique, but both use only a single modality. The SMBR approach achieves more than 4% improvement over the other techniques (96.9% for SMBR-WE against 92.2% for the Block Sparse method). Thus, robust classification using multiple modalities results in a significant improvement over the current benchmark. Further, a comparison with discriminative methods such as SLR and SVM shows that they perform poorly compared to the proposed method. This is because weak modalities are hard to discriminate, hence score-level fusion with the strong modality does not improve performance. On the other hand, by appropriately weighing different modalities, MKLFusion achieves a better result. However, by imposing reconstruction and joint sparsity simultaneously, the proposed method is able to achieve superior performance.

• Effect of noise: In this experiment, test images were corrupted with white Gaussian noise of increasing variance, $\sigma^2$. Comparisons are shown in Figure 10. It can be seen that the SMBR, SR and Block Sparse methods are stable with noise. The performance of the other algorithms degrades sharply with the noise level. This also highlights the problem with MKLFusion, as it is not robust to degradation during testing.

• Effect of occlusion: In this experiment, a randomly chosen block of the test image was occluded. The recognition performance was studied with increasing block size. Figure 11 shows the performance of various algorithms with block size. SMBR-E is the most stable among all the methods due to its robust handling of error. Recognition rates for the other methods fall sharply with increasing block size.

• Recognition in spite of disguise: We also performed an experiment on the rest of the AR face dataset, with faces occluded by sunglasses and scarves. Similar to the above experiment, 7 frontal non-occluded images per subject, from the first session, were used for training, and 12 occluded images per person, from both sessions, were used for testing. Again, the proposed SMBR-WE and SMBR-E methods outperformed the other methods. The SMBR-E method gave the best overall performance, improving by 17.7% over the Block Sparse method.


Method              Scarves   Sun-glass   Overall
SMBR-WE             86.2      36.0        61.1
SMBR-E              80.0      75.0        77.5
SR                  45.3      52.3        48.8
Block Sparse [50]   65.8      53.8        59.8
SLR-Sum             72.2      39.6        55.9
SVM-Sum             13.8      42.5        28.1
MKLFusion           47.7      13.0        30.3

TABLE IX: Rank one performance comparison (%) of the proposed method.

• Quality based fusion: Quality determination is important for fusion here, as a strong modality is being combined with weak modalities. We studied the effect of the quality measure introduced in Section III. However, in this case we fix the quality for the strong modality, viz. the whole face, to be 1, while for the weak modalities the SCI values were taken. The recognition performance for SMBR-E and SMBR-WE across different noise and occlusion levels was studied. Figure 12 shows the performance comparison with the unweighted methods. Using quality, the recognition performance for SMBR-WE goes up to 97.4% from 96.9%, whereas for SMBR-E it increases to 97% from 96%. Similarly, results improve across different noise levels for both methods. However, SMBR-WE with quality shows worse performance as the block size is increased. This may be because it does not handle sparse error, hence the quality values are not robust.
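The quality weighting described in the last item can be illustrated with the standard sparsity concentration index (SCI) of a coefficient vector. The sketch below is an assumption-laden illustration rather than the paper's implementation: the exact quality measure is defined in Section III and is not reproduced here, the function names are hypothetical, and using the weights to form a weighted sum of per-class residuals is one plausible way to apply them.

```python
import numpy as np

def sci(coeffs, atom_labels, n_classes):
    """Sparsity concentration index of one modality's coefficient vector.

    Returns 1 when all coefficient mass lies on a single class and 0 when
    it is spread evenly over all classes.
    """
    mass = np.abs(np.asarray(coeffs, dtype=float))
    total = mass.sum()
    if total == 0:
        return 0.0
    per_class = np.bincount(atom_labels, weights=mass, minlength=n_classes)
    return (n_classes * per_class.max() / total - 1.0) / (n_classes - 1.0)

def modality_weights(coeffs_per_modality, atom_labels, n_classes, strong=0):
    """SCI weight for each modality, with the strong modality fixed to 1."""
    w = np.array([sci(c, atom_labels, n_classes) for c in coeffs_per_modality])
    w[strong] = 1.0
    return w

def weighted_decision(residuals, weights):
    """Pick the class with the smallest quality-weighted total residual.

    residuals : (n_modalities, n_classes) reconstruction errors.
    """
    return int(np.argmin(weights @ residuals))

# Three classes, two training atoms per class.
atom_labels = np.array([0, 0, 1, 1, 2, 2])
concentrated = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0])
spread = np.ones(6)
print(sci(concentrated, atom_labels, 3))  # 1.0
print(sci(spread, atom_labels, 3))        # 0.0
```

A modality whose coefficients are spread over many classes therefore contributes little to the fused decision, while the whole face always contributes with full weight.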

VI. Computational Complexity

The proposed algorithms are computationally efficient. The main steps of the algorithms are the update steps for $\Gamma$, $\Lambda$, $U$ and $V$. For linear fusion, the update step for $\Gamma$ involves computing $(X^{i^T} X^i + \alpha_\Gamma I)^{-1}$ and four matrix multiplications. The first term is constant across iterations and can be pre-computed. Matrix multiplication for two matrices of sizes $m \times n$ and $n \times p$ can be done in $O(mnp)$ time. Hence, for given training and test data, the computations are linear in the feature dimension, and large feature dimensions can be handled efficiently. Similarly, the update step for $\Lambda$ involves the matrix multiplication $X^i \Gamma^i$. The update steps for $U$ and $V$ involve only scalar matrix computations and are very fast. Similarly, in kernel fusion, the update for $\Gamma$ involves calculating $(K_{X^i,X^i} + \alpha_\Gamma I)^{-1}$, which can be pre-computed. Other steps are similar to linear fusion. The classification step involves calculating the residual error for each class, and is efficient.
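The pre-computation argument above can be illustrated as follows. The sketch forms the regularised inverse once and reuses it in every iteration, so the only per-iteration cost that depends on the feature dimension is a matrix product. The right-hand side used here is a generic ridge-type step chosen for illustration; the paper's actual update contains additional terms, and the names `precompute_inverse`, `coefficient_step` and `alpha` are assumptions of this example.

```python
import numpy as np

def precompute_inverse(X, alpha):
    """Inverse of (X^T X + alpha I) for a (d, n) dictionary X.

    Forming X^T X costs O(n^2 d), i.e. linear in the feature dimension d,
    and is done once before the iterations start.
    """
    n_atoms = X.shape[1]
    return np.linalg.inv(X.T @ X + alpha * np.eye(n_atoms))

def coefficient_step(P, X, Y, V, alpha):
    """Solve (X^T X + alpha I) G = X^T Y + alpha V using the cached inverse P."""
    return P @ (X.T @ Y + alpha * V)

rng = np.random.default_rng(0)
d, n, t = 500, 20, 3                 # feature dim, training atoms, test samples
X = rng.standard_normal((d, n))
Y = rng.standard_normal((d, t))
V = np.zeros((n, t))
alpha = 0.1

P = precompute_inverse(X, alpha)     # computed once
for _ in range(5):
    G = coefficient_step(P, X, Y, V, alpha)
    V = G                            # stand-in for the auxiliary-variable update
print(G.shape)  # (20, 3)
```

In practice a cached Cholesky factorisation is often preferred to an explicit inverse for numerical stability; the explicit inverse is used here only because it mirrors the description in the text.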

Fig. 10: Effect of noise on recognition performance.

Fig. 11: Effect of occlusion on recognition performance.

Method              Recognition Rate (%)   Method       Recognition Rate (%)
SMBR-WE             96.9                   SVM-Sum      86.7
SMBR-E              96                     SLR-Sum      77.9
SR                  91                     FDDL [49]    91.9
Block Sparse [50]   92.2                   MKLFusion    89.7

TABLE VIII: Rank one performance comparison of the proposed method.

VII. Conclusion

We have proposed a novel joint sparsity-based feature level fusion algorithm for multimodal biometrics recognition. The algorithm is robust as it explicitly includes both noise and occlusion terms. An efficient algorithm based on the alternative direction method was proposed for solving the optimization problem. We also proposed a multimodal quality measure based on sparse representation. Further, the algorithm was extended to handle non-linear variations through the use of kernels. Various experiments have shown that our method is robust and significantly improves the overall recognition accuracy.